Distributed Data Clustering
Abstract
To make effective use of distributed information, it is desirable to allow coordination and collaboration among various information sources. This paper deals with clustering data emanating from different sites. The clustering process consists of three steps: find the (local) clusters of data at each site; find (higher) clusters from the union of the distributed data sets at the central site; and finally compute the associations between the two sets of clusters. The approach aims to discover the hidden structure of multi-source data and to assign unseen data points coming from a site to the right higher cluster without any need to access their feature values. The proposed approach is evaluated experimentally.
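The abstract does not say which clustering algorithm is used at either level or how the associations between local and higher clusters are computed. The sketch below illustrates the three steps under stated assumptions: plain k-means stands in for both the local and the higher-level clustering, and each local cluster is associated with the higher cluster that receives the majority of its members. The function names `kmeans`, `distributed_clustering` and `assign_unseen` are hypothetical and not taken from the paper.

```python
import random
from collections import Counter


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means, an ASSUMED stand-in for the paper's clustering.

    Returns (centroids, labels), where labels[i] is the cluster of points[i].
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # k data points drawn without replacement
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j]))
                  for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centre if a cluster ends up empty.
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids, labels


def distributed_clustering(sites, k_local, k_higher):
    # Step 1: each site clusters its own data.
    local = [kmeans(data, k_local, seed=s) for s, data in enumerate(sites)]
    # Step 2: the central site clusters the union of all site data.
    union = [p for data in sites for p in data]
    _, higher_labels = kmeans(union, k_higher)
    # Step 3 (ASSUMED rule): associate each (site, local cluster) with the
    # higher cluster that most of its members fell into (majority vote).
    votes = {}
    offset = 0
    for s, data in enumerate(sites):
        _, labels = local[s]
        for i, lab in enumerate(labels):
            votes.setdefault((s, lab), Counter())[higher_labels[offset + i]] += 1
        offset += len(data)
    association = {key: c.most_common(1)[0][0] for key, c in votes.items()}
    return local, association


def assign_unseen(point, site, local, association):
    # The site labels the point with its own local centroids...
    centroids, _ = local[site]
    lab = min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))
    # ...and the central site maps (site, local label) to a higher cluster
    # using only the association table, never the point's feature values.
    # Returns None if that local cluster received no training points.
    return association.get((site, lab))


# Two sites, each holding points from two well-separated groups.
site_a = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (10.0, 10.1), (10.2, 9.9)]
site_b = [(0.1, 0.0), (9.9, 10.0), (10.1, 10.2), (10.0, 9.8)]
local, assoc = distributed_clustering([site_a, site_b], k_local=2, k_higher=2)
near_origin_a = assign_unseen((0.05, 0.05), 0, local, assoc)
near_origin_b = assign_unseen((0.0, 0.0), 1, local, assoc)
far_b = assign_unseen((10.05, 10.0), 1, local, assoc)
```

In this sketch, assigning an unseen point only requires the pair (site, local cluster label) to reach the central site, which is the property the abstract highlights. Points near the origin coming from either site should land in the same higher cluster, and the point near (10, 10) in the other one. Whether the paper uses a hard majority vote or a softer association measure is not stated in the abstract.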
Similar resources
Entropy-based Consensus for Distributed Data Clustering
The increasingly larger scale of available data and the more restrictive concerns on their privacy are some of the challenging aspects of data mining today. In this paper, Entropy-based Consensus on Cluster Centers (EC3) is introduced for clustering in distributed systems with a consideration for confidentiality of data; i.e. it is the negotiations among local cluster centers that are used in t...
Distributed and Collaborative Fuzzy Modeling
In this study, we introduce and study a concept of distributed fuzzy modeling. Fuzzy modeling encountered so far is predominantly of a centralized nature, being focused on the use of a single data set. In contrast to this style of modeling, the proposed paradigm of distributed and collaborative modeling is concerned with distributed models which are constructed in a highly collaborative fashion. I...
Improving Imbalanced Data Classification Accuracy by Using Fuzzy Similarity Measure and Subtractive Clustering
Classification is one of the important parts of data mining and knowledge discovery. In most cases, the data used for training is not well distributed. This imbalance occurs when one class has a large number of samples while the other class naturally has only a few. In general, the methods of solving this kind of prob...
Outlier Detection in Wireless Sensor Networks Using Distributed Principal Component Analysis
Detecting anomalies is an important challenge for intrusion detection and fault diagnosis in wireless sensor networks (WSNs). To address the problem of outlier detection in wireless sensor networks, in this paper we present a PCA-based centralized approach and a DPCA-based distributed energy-efficient approach for detecting outliers in sensed data in a WSN. The outliers in sensed data can be ca...
Distributed Balanced Clustering via Mapping Coresets
Large-scale clustering of data points in metric spaces is an important problem in mining big data sets. For many applications, we face explicit or implicit size constraints for each cluster which leads to the problem of clustering under capacity constraints or the “balanced clustering” problem. Although the balanced clustering problem has been widely studied, developing a theoretically sound di...
A Distributed and Parallel Clustering Algorithm for Massive Biological Data
Distributed processing today is a highly advantageous technology for bridging together multiple computers and processor systems to run applications. Distributed processing has made it possible to cut processing time and therefore reduce costs. Building on this, we aim to apply clustering techniques to develop a new method that further reduces time and costs. The problem of cl...
Publication date: 2003